SOME RELATIONS BETWEEN MUTUAL INFORMATION AND
ESTIMATION ERROR IN WIENER SPACE 



By Eddy Mayer-Wolf and Moshe Zakai
Technion 

The model considered is that of "signal plus white noise." Known 
connections between the noncausal filtering error and mutual infor- 
mation are combined with new ones involving the causal estimation 
error, in a general abstract setup. The results are shown to be in- 
variant under a wide class of causality patterns; they are applied to 
the derivation of the causal estimation error of a Gaussian nonsta- 
tionary filtering problem and to a multidimensional extension of the 
Yovits-Jackson formula.



1. Introduction. The classical "additive Gaussian channel" model
consists of an m-dimensional "white noise" $\{n_t, t\in[0,T]\}$, an
m-dimensional (not necessarily Gaussian) independent "signal process"
$\{x_t, t\in[0,T]\}$ and the "received signal" $y_t=\sqrt{\gamma}\,x_t+n_t$,
where $\gamma$ is the signal to noise parameter. (It also deals with the
stationary version where [0, T] is replaced by $(-\infty,\infty)$ and $x_t$
is assumed to be a stationary process.) In the context of filtering
theory, the main entities are the noncausal estimate and its associated
estimation mean square error

(1.1) $\tilde\varepsilon^2(\gamma)=\int_0^T E\,|x_t-E(x_t\mid y_\theta,\ \theta\in[0,T])|^2\,dt$

as well as the causal estimate and its associated filtering mean square error 

(1.2) $\varepsilon^2(\gamma)=\int_0^T E\,|x_t-E(x_t\mid y_\theta,\ \theta\in[0,t])|^2\,dt.$

Another aspect of the white Gaussian channel is the "mutual information" 
I(x,y) between the signal process and the received message defined by 

(1.3) $I(x,y)=E\log\frac{d\mu_{x,y}}{d(\mu_x\times\mu_y)}(x,y)$,

where the argument of the logarithm is the Radon-Nikodym derivative
between the joint measure of $x_\cdot$ and $y_\cdot$ and the product measure
induced by $x_\cdot$ and $y_\cdot$. This notion was introduced by Shannon and
is essential in the definition of channel capacity, which in turn
determines the possibility of transmitting signals through the channel
with arbitrarily small error. The mutual
information between "random objects" has been thoroughly analyzed and 
explicit results have been obtained, particularly for Gaussian signals and 
noise (cf. [7]). 

Recently, Guo, Shamai and Verdú [3] derived interesting new results for
the Gaussian channel relating the mutual information with the noncausal 
estimation error. These results were extended in [13] to include the abstract 
Wiener space setup, thus extending considerably the applicability of the 
new relations. As for the causal estimation problem, some general results 
are known, starting with the Yovits-Jackson formula [12], see Snyders [8, 9] 
for further results in this direction. Moreover, the relation between mutual 
information and the causal error appeared in the literature in the early 1970s 
[1, 5]. The possibility of extending these results to the abstract Wiener space 
was pointed out in [13]. 

The purpose of this paper is to consider the "noise" as a general Gaussian 
random vector and to establish connections between the causal estimation 
error and mutual information in this abstract setting. In addition, some 
new consequences of these connections are obtained, such as the concav- 
ity of the causal estimation error as a function of the noise-to-signal ratio 
(Corollary 3.3) as well as an explicit expression for the causal error in the 
estimation of a general (not necessarily stationary) Gaussian signal (Theo- 
rem 4.1), from which the Yovits-Jackson formula for a stationary Gaussian 
signal process follows quite directly (Proposition 4.3). 

The context of an abstract Wiener space, apart from its intrinsic elegance, 
accommodates a wide range of signal models involving, for example, vector 
valued processes time reversed in some of its coordinates. We feel that this 
flexibility justifies the inclusion of the necessary abstract and sometimes 
tedious Wiener space analysis background in Section 2 and Section 3.1. On 
the other hand, as pointed out in the next section, the main results can 
also be of value to the reader who prefers to interpret their ingredients as 
concrete one dimensional processes. 

We now outline the contents of this paper. In the next section the basic 
abstract Gaussian channel setup is introduced and some preliminary adapt- 
edness results in the associated abstract Wiener space are established. In 
Section 3 the results of [1] and [5] are extended to the abstract Wiener 
space which, however, does not have any intrinsic notion of causality. Ac- 
cordingly, it is equipped with a time structure by adding an appropriate 
"chain of projections" (namely, a continuous increasing resolution of the 
identity). It turns out that the causal estimation error is independent of the 
particular choice of the chain of projections, and is closely related to the
mutual information I(x,y). Moreover, this relation persists when the inde- 
pendence assumption between the signal x and the noise n is relaxed to allow 
for nonanticipative dependence, as in [5]. These results when combined with 
the earlier results on the nonadapted error yield a direct relation between the 
causal and noncausal errors. In Section 4 we derive the formulae alluded to 
in the previous paragraph, namely
$\varepsilon^2(\gamma)=\gamma^{-1}\sum_i\log(1+\gamma\lambda_i)$ for a Gaussian
process $x_t$ on [0, T] whose correlation function has an eigenfunction
expansion $\sum_i\lambda_i\varphi_i(s)\varphi_i(t)$, and the multidimensional
version of the Yovits-Jackson formula
$\varepsilon^2(\gamma)=(2\pi\gamma)^{-1}\int_{-\infty}^{\infty}\log\det(I+\gamma\sigma(\xi))\,d\xi$
for a stationary Gaussian signal with (matricial) spectral density $\sigma$.

2. Preliminaries. This work studies the basic signal plus noise model, 
which will now be formally described, modeled on the abstract Wiener space 
to allow for maximal generality as mentioned in the Introduction. However, 
many of the paper's statements — including Theorem 3.1, Corollary 3.3 and 
the contents of Section 4 — can be appreciated even in the simplest instance 
[cf. with (2.5)] 

(2.1) $y_t=u_t+w_t,\qquad 0\le t\le T$

(where the noise is represented by the Brownian motion $\{w_t\}$ and, at
each $t\in[0,T]$, the signal $u_t$ depends at most on a "hidden" process
$\{x_t\}$ independent of $\{w_t\}$ and, via feedback, on y's "past"
$\{y_s, 0\le s\le t\}$, i.e., $u_t=U(x_0^T,y_0^t)$), without the need to
master the details of the abstract setup whose data we now list:

M1. A complete filtered probability space
$(\Theta,\mathcal F,\{\mathcal F_t, 0\le t\le 1\},P)$.

M2. A random variable x defined on $(\Theta,\mathcal F,P)$ taking its values
in a Polish space X and inducing on it its image measure $\mu_x$.

M3. A centered nondegenerate Gaussian random variable w defined on
$(\Theta,\mathcal F,P)$, independent of x, taking values in a Banach space
$\Omega$ with image measure $\mu_w$, and separable associated reproducing
kernel Hilbert space H. The non-degeneracy assumption means that H is
densely embedded in $\Omega$, namely, $(\Omega,H,\mu_w)$ is an abstract
Wiener space and

$\Omega^*\hookrightarrow H\hookrightarrow\Omega.$

M4. A time structure on $(\Omega,H,\mu_w)$ in the form of a continuous
strictly increasing coherent resolution of the identity
$\{\pi_t, 0\le t\le 1\}$ of H, namely a (continuous, increasing) family of
orthogonal projections on H ranging from $\pi_0=0_H$ to $\pi_1=\mathrm{Id}_H$,
such that $\pi_t\Omega^*\subset\Omega^*$ and
${}_\Omega\langle w,\pi_t l\rangle_{\Omega^*}$ is $\mathcal F_\cdot$-adapted,
for all $0\le t\le 1$ and $l\in\Omega^*$.

With such a time structure one can mimic the standard resolution of identity
$(\pi_t h)_\cdot=h_{\cdot\wedge t}$ in classical Wiener space $C_0[0,1]$ (in
fact (cf. [11], Theorem 5.1) any abstract Wiener space thus equipped with a
resolution of the identity is equivalent "in a suitable sense" to
$C_0([0,1];\mathbb R^d)$, for some $d\in\mathbb N\cup\{\infty\}$. This will not
be used in the sequel):

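(i) Any $\Omega$-valued random variable z induces a filtration
$\{\mathcal F_t^z, 0\le t\le 1\}$ in $(\Theta,\mathcal F)$:

(2.2) $\mathcal F_t^z=\sigma\big({}_\Omega\langle z,\pi_t l\rangle_{\Omega^*},\ l\in\Omega^*\big),\qquad 0\le t\le 1.$

(The above adaptedness requirement can be expressed as
$\mathcal F_\cdot^w\subset\mathcal F_\cdot$.)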

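(ii) Given a generic subfiltration $\{\mathcal G_t, 0\le t\le 1\}$ of
$\{\mathcal F_t, 0\le t\le 1\}$, an $\Omega$-valued random variable z is said
to be $(\pi_\cdot,\mathcal G_\cdot)$-adapted if
${}_\Omega\langle z,\pi_t l\rangle_{\Omega^*}$ is $\mathcal G_t$-measurable,
for all $0\le t\le 1$ and $l\in\Omega^*$. Examples of
$(\pi_\cdot,\mathcal G_\cdot)$-adapted random variables are provided in
increasing generality, for a partition $\{0=t_0<\cdots<t_n=1\}$ of [0, 1]
(and with $\pi_s^t:=\pi_t-\pi_s$) by

(2.3) $h=\sum_{k=0}^{n-1}a_kh_k,\qquad a_k\in L^2(\Theta,\mathcal G_{t_k},P),\ \ h_k\in\pi_{t_k}^{t_{k+1}}(H),$

(2.4) $h=\sum_{k=0}^{n-1}h_k,\qquad h_k\in L^2\big(\Theta,\mathcal G_{t_k},P;\ \pi_{t_k}^{t_{k+1}}(H)\big).$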

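(iii) A mapping $g:\Omega\to\Omega$ is $\pi_\cdot$-nonanticipative if g(z) is
$(\pi_\cdot,\mathcal F_\cdot^z)$-adapted for any $\Omega$-valued z, that is,
if ${}_\Omega\langle g(z),\pi_t l\rangle_{\Omega^*}$ is
$\mathcal F_t^z$-measurable for all such z, $l\in\Omega^*$ and $0\le t\le 1$.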

M5. A jointly measurable mapping $U:X\times\Omega\to H$,
$\pi_\cdot$-nonanticipative in its second variable, and a pair of
$\mathcal F_\cdot$-adapted random variables $u\in L^2(P;H)$ and y
($\Omega$-valued) which satisfy the simultaneous equations

(2.5) $y=u+w,\qquad u=U(x,y).$

Equivalently $\{(u_x,y_x),\ x\in X\}$ is an $\mathcal F_\cdot$-adapted
$H\times\Omega$-valued random field with $u_x\in L^2(P;H)$ $\mu_x$-a.s., and
which satisfy

(2.6) $y_x=u_x+w,\qquad u_x=U(x,y_x),$

the connection between (2.5) and (2.6) being
$u(\theta)=u_x(\theta)\big|_{x=x(\theta)}$.

We now present for later use two facts related to the objects introduced 
above. 

Lemma 2.1. For any $h,k\in H$, the function $m(t):=(h,\pi_tk)_H$ is
continuous and has bounded variation on [0, 1].

Proof. The continuity of m follows from that of $t\to\pi_t$. In addition,

$m(t)=(\pi_th,\pi_tk)_H=\tfrac14\big(|\pi_t(h+k)|_H^2-|\pi_t(h-k)|_H^2\big)$

so that m has bounded variation, being the difference of two increasing
functions. □

Lemma 2.2. The random variables of the form (2.3) [thus those of the
form (2.4) as well] generate the same $\sigma$-algebra as the one generated
by the family of all $(\pi_\cdot,\mathcal G_\cdot)$-adapted random variables.
This $\sigma$-algebra will be denoted $\mathcal A_{\pi_\cdot,\mathcal G_\cdot}$.

Proof. By density arguments it suffices to check that 0 is the only
$(\pi_\cdot,\mathcal G_\cdot)$-adapted element u in $L^2(P;H)$ orthogonal to
all the random variables of form (2.3). Indeed, for any s < t in [0, 1],
$h\in H$ and $a\in L^2(\Theta,\mathcal G_s,P)$,

$0=E\,(a(\pi_t-\pi_s)h,u)_H=E\,a\big((\pi_th,u)_H-(\pi_sh,u)_H\big).$

This means that $(\pi_th,u)_H$ is a (continuous) martingale, which in
addition has bounded variation a.s., by Lemma 2.1, and is therefore a.s.
constant in t. Since it is 0 a.s. for t = 0, the same is true for t = 1,
and since $h\in H$ is arbitrary it follows that u = 0. □

We shall be concerned with the causal and noncausal least mean square 
estimators 

(2.7) $\hat h_\cdot^y=E(h\mid\mathcal A_{\pi_\cdot,\mathcal F_\cdot^y})$ and $\hat h^y=E(h\mid\mathcal F_1^y)$

of an H-valued random variable $h\in L^2(P;H)$, typically h = u or h = x
(the notation $\mathcal A_{\pi_\cdot,\mathcal F_\cdot^y}$ was introduced in
Lemma 2.2). A central theme of this paper is the relation between their
respective associated mean square errors $E|h-\hat h_\cdot^y|_H^2$ and
$E|h-\hat h^y|_H^2$ with the mutual information between x and y, now to be
defined.

Mutual information. The following definition applies for two general ran- 
dom variables x and y defined on a common probability space, the latter 
taking values in a Polish space so that y's regular conditional probability 
measure $\mu_{y|x}$ conditioned on x is well defined. In our case, where x
is given in M2 and y by the equations (2.5), the key observation is that
$\mu_{y|x}$ can be expressed in terms of the image measures $\mu_{y_x}$ of
the elements $y_x$, $x\in X$, introduced in (2.6):

(2.8) $\mu_{y|x}=\mu_{y_x}\big|_{x=x}$ P-a.s.

Definition 2.3. The mutual information between x and y is defined
to be

(2.9) $I(x;y)=\begin{cases}E\Big(\log\dfrac{d\mu_{y|x}}{d\mu_y}(y)\Big), & \text{if }\mu_{y|x}\ll\mu_y,\ \mu_x\text{-a.s.},\\[4pt] \infty, & \text{otherwise.}\end{cases}$

Despite (2.9)'s apparent asymmetry, it turns out that I(x;y) = I(y;x).
In fact, the identities
$\frac{f(y|x)}{f(y)}=\frac{f(x|y)}{f(x)}=\frac{f(x,y)}{f(x)f(y)}$ generalize
easily beyond finite dimensions: the following fact is well known and its
proof is straightforward.

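Lemma 2.4.

$\mu_{y|x}\ll\mu_y,\ \mu_x\text{-a.s.}\iff\mu_{x|y}\ll\mu_x,\ \mu_y\text{-a.s.}\iff\mu_{x,y}\ll\mu_x\times\mu_y$

and when one and thus all of these hold,
$\frac{d\mu_{y|x}}{d\mu_y}(y)=\frac{d\mu_{x|y}}{d\mu_x}(x)=\frac{d\mu_{x,y}}{d(\mu_x\times\mu_y)}(x,y)$
P-a.s.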

Whenever valid (i.e., as long as one does not get $\infty-\infty$) it will
be convenient to write

(2.10) $I(x;y)=E\log\frac{d\mu_{y|x}}{d\mu_w}(y)-E\log\frac{d\mu_y}{d\mu_w}(y),$

since both terms in the difference can be derived from a generalized Girsanov 
theorem. 
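The decomposition (2.10) can be illustrated in the simplest scalar Gaussian case. The sketch below is only an illustration under assumptions that are not part of the paper's setup: H is replaced by the real line, x ~ N(0, lam), w ~ N(0, 1) and y = sqrt(gamma) x + w, so that all three measures are Gaussian and their log-densities are explicit. The names `sample_log_ratios`, `gamma` and `lam` are hypothetical. The two sample averages estimate the two terms of (2.10), and their difference is compared with the classical value (1/2) log(1 + gamma lam).

```python
import math
import random

def sample_log_ratios(gamma, lam, n_samples, seed=0):
    """Monte Carlo averages of the two log-derivatives appearing in (2.10),
    in the assumed scalar model x ~ N(0, lam), y = sqrt(gamma)*x + w."""
    rng = random.Random(seed)
    v = 1.0 + gamma * lam                     # variance of y
    first = second = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, math.sqrt(lam))
        w = rng.gauss(0.0, 1.0)
        y = math.sqrt(gamma) * x + w
        # log of the density ratio N(sqrt(gamma)*x, 1) / N(0, 1), evaluated at y
        first += math.sqrt(gamma) * x * y - 0.5 * gamma * x * x
        # log of the density ratio N(0, v) / N(0, 1), evaluated at y
        second += -0.5 * math.log(v) + 0.5 * y * y * (1.0 - 1.0 / v)
    return first / n_samples, second / n_samples

gamma, lam = 2.0, 1.5                         # hypothetical parameter values
first, second = sample_log_ratios(gamma, lam, n_samples=100_000)
estimate = first - second                     # Monte Carlo version of (2.10)
exact = 0.5 * math.log(1.0 + gamma * lam)     # scalar Gaussian mutual information
print(estimate, exact)
```

In this toy model the first average approximates gamma lam / 2, which is the scalar counterpart of the expression obtained later for the first term of (2.10).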

3. The connection between estimation errors and mutual information. 

The main result of this section is the following theorem. It implies in par- 
ticular that the causal least mean square error does not depend on the 
resolution of identity which dictates the time structure. 

Theorem 3.1. Within the setup M1-M5, and recalling the notation (2.7),

(3.1) $I(x,y)=\tfrac12E|u-\hat u_\cdot^y|_H^2$

and in the particular case $y=\sqrt{\gamma}\,x+w$ of (2.5),

(3.2) $I(x,y)=\tfrac{\gamma}{2}E|x-\hat x_\cdot^y|_H^2.$

In the classical case $\Omega=C_0[0,T]$, (3.2) goes back to [1] and the more
general case (3.1) in which feedback is allowed was obtained in [5]. The new 
contribution here is the full extension of (3.1) to the abstract setup. The 
heart of its proof consists in deriving, in the next subsection, expressions for 
the Radon-Nikodym derivatives appearing in (2.10) from an abstract version 
of Girsanov's formula. The theorem's proof will be finalized in Section 3.2. 

In this context it is worth stating a recently obtained (for linear obser- 
vations) connection between the noncausal error and mutual information. 
Theorem 3.2 ([3, 13]). In the particular case $y=\sqrt{\gamma}\,x+w$ of (2.5)

(3.3) $\dfrac{dI(x,y)}{d\gamma}=\tfrac12E|x-\hat x^y|_H^2.$
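As a sanity check of the derivative relation between mutual information and the noncausal error, the following sketch treats the scalar Gaussian case, which is an assumption made here for illustration only: x ~ N(0, lam), w ~ N(0, 1), y = sqrt(gamma) x + w, for which the mutual information is (1/2) log(1 + gamma lam) and the noncausal error is lam / (1 + gamma lam). The function names and the parameter values are hypothetical. The derivative in gamma is approximated by a central difference.

```python
import math

def mutual_information(gamma, lam):
    """I(x;y) for scalar x ~ N(0, lam) and y = sqrt(gamma)*x + w, w ~ N(0, 1)."""
    return 0.5 * math.log1p(gamma * lam)

def noncausal_error(gamma, lam):
    """E|x - E(x|y)|^2 for the same scalar Gaussian model."""
    return lam / (1.0 + gamma * lam)

def central_difference(f, point, step=1e-5):
    """Symmetric finite-difference approximation of f'(point)."""
    return (f(point + step) - f(point - step)) / (2.0 * step)

lam, gamma = 1.5, 2.0                          # hypothetical parameter values
derivative = central_difference(lambda g: mutual_information(g, lam), gamma)
half_error = 0.5 * noncausal_error(gamma, lam)
print(derivative, half_error)
```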

Theorems 3.1 and 3.2 together yield the following interesting connection 
between the causal and noncausal errors (cf. [3] as well). 

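Corollary 3.3. For $y=\sqrt{\gamma}\,x+w$ denote
$\tilde\varepsilon^2(\gamma)=E|x-\hat x^y|_H^2$ and
$\varepsilon^2(\gamma)=E|x-\hat x_\cdot^y|_H^2$. Then

(3.4) $\varepsilon^2(\gamma)=\dfrac1\gamma\displaystyle\int_0^\gamma\tilde\varepsilon^2(\gamma_0)\,d\gamma_0.$

In addition, $\varepsilon^2(1/\eta)$ is a concave function of $\eta$. (We thank
an anonymous referee who pointed out an error in an earlier version of this
statement, and in its proof.)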

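Proof. The identity (3.4) follows directly from (3.2) and (3.3). As for
the concavity, denote $h(\eta)=\varepsilon^2(1/\eta)=2\eta I(x,y)$, where
$\gamma=1/\eta$. Then

$h'(\eta)=2I(x,y)+2\eta\,\dfrac{dI(x,y)}{d\eta}=2I(x,y)-\dfrac1\eta\,\tilde\varepsilon^2\!\Big(\dfrac1\eta\Big)$ and

$h''(\eta)=-\dfrac1{\eta^2}\,\tilde\varepsilon^2\!\Big(\dfrac1\eta\Big)+\dfrac1{\eta^2}\,\tilde\varepsilon^2\!\Big(\dfrac1\eta\Big)-\dfrac1\eta\,\dfrac{d}{d\eta}\Big(\tilde\varepsilon^2\!\Big(\dfrac1\eta\Big)\Big)=-\dfrac1\eta\,\dfrac{d}{d\eta}\Big(\tilde\varepsilon^2\!\Big(\dfrac1\eta\Big)\Big)\le0$

since $\tilde\varepsilon^2(\gamma)$ is clearly a nonincreasing function of
$\gamma$. □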
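The concavity statement can be checked numerically in the Gaussian case, using the closed form for the causal error that is derived in Theorem 4.1, namely (1/gamma) times the sum of log(1 + gamma lambda_i). The sketch below is an illustration only: the eigenvalues are hypothetical, and concavity of eta -> eps^2(1/eta) is tested through the sign of centered second differences on a grid.

```python
import math

def causal_error(gamma, eigenvalues):
    """Closed form (1/gamma) * sum_i log(1 + gamma*lambda_i) for a Gaussian signal."""
    return sum(math.log1p(gamma * lam) for lam in eigenvalues) / gamma

def second_difference(f, point, step):
    """Centered second difference; negative values indicate local concavity."""
    return f(point + step) - 2.0 * f(point) + f(point - step)

eigenvalues = [2.0, 1.0, 0.5]                  # hypothetical spectrum, illustration only
h = lambda eta: causal_error(1.0 / eta, eigenvalues)   # eps^2 viewed as a function of eta
etas = [0.2 + 0.1 * k for k in range(40)]
second_differences = [second_difference(h, eta, 0.05) for eta in etas]
print(max(second_differences))
```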

Remark 3.4. Viewing $\varepsilon^2$ as a function of $1/\gamma$ is equivalent
to considering the equally natural model $y=x+\sqrt{\eta}\,w$ instead of
$y=\sqrt{\gamma}\,x+w$.

3.1. Girsanov theorem and Radon-Nikodym derivatives on $\Omega$.
Throughout this subsection, $\{\mathcal G_t, 0\le t\le 1\}$ will be a generic
subfiltration of $\{\mathcal F_t, 0\le t\le 1\}$, typically $\mathcal F_\cdot$
or $\mathcal F_\cdot^z$ as defined in (2.2). First, recall the standard
Girsanov theorem, in which $\Omega$ is the classical Wiener space
$C_0([0,1])$.

Proposition 3.5. Let $\{b_t, 0\le t\le 1\}$ be a standard
$\mathcal G_\cdot$-Brownian motion, $\{a_t, 0\le t\le 1\}$ a
$\mathcal G_\cdot$-adapted stochastic process with $\dot a_\cdot\in L^2(0,1)$
a.s. and $y_t=a_t+b_t$, $0\le t\le 1$. Denote

(3.5) $\Lambda_a=\exp\Big(-\displaystyle\int_0^1\dot a_t\,db_t-\tfrac12\int_0^1\dot a_t^2\,dt\Big).$

If $E\Lambda_a=1$ then $\{y_t, 0\le t\le 1\}$ is a standard
$\mathcal G_\cdot$-Brownian motion under the measure $\Lambda_a\cdot P$.
Equivalently (the Jacobi change of variable formula)
$E\Lambda_aF(y_\cdot)=EF(b_\cdot)$, for any $F\in C_b(C_0[0,1])$.
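The change of variable identity can be illustrated in a one-dimensional analogue, which is an assumption made here and not the infinite-dimensional statement itself: b is a single N(0, 1) variable, the drift is a constant `shift`, the stochastic integral is replaced by the product `shift * b`, and F is the bounded function cos. Both expectations are computed by a composite Simpson rule on a truncated interval; the helper names are hypothetical.

```python
import math

def simpson(f, lower, upper, intervals=4000):
    """Composite Simpson rule; `intervals` must be even."""
    width = (upper - lower) / intervals
    total = f(lower) + f(upper)
    for k in range(1, intervals):
        total += (4.0 if k % 2 else 2.0) * f(lower + k * width)
    return total * width / 3.0

def gaussian_expectation(g):
    """E g(b) for b ~ N(0, 1), truncating the integral to [-12, 12]."""
    density = lambda b: math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)
    return simpson(lambda b: g(b) * density(b), -12.0, 12.0)

shift = 0.7                                    # hypothetical constant drift
F = math.cos                                   # a bounded continuous test function
# Scalar analogue of the exponential weight in (3.5).
weight = lambda b: math.exp(-shift * b - 0.5 * shift * shift)

lhs = gaussian_expectation(lambda b: weight(b) * F(shift + b))   # weighted expectation of F(y)
rhs = gaussian_expectation(F)                                    # expectation of F(b)
print(lhs, rhs)
```

For this choice of F both sides equal exp(-1/2), the value of E cos(b) for a standard Gaussian b.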

In the context of an abstract Wiener space Ito's integral is defined along
the same lines as in the classical case. We now proceed to summarize its 
construction and refer the reader to [10], Section 2.6, for a more detailed 
account. 

Definition 3.6. An $\Omega$-valued random variable v is said to be a
$\mathcal G_\cdot$-abstract Wiener process if, for all $l\in\Omega^*$,
$M_t^{v,l}:={}_\Omega\langle v,\pi_tl\rangle_{\Omega^*}$ is a zero mean (and
necessarily continuous, Gaussian) $\mathcal G_\cdot$-martingale with quadratic
variation $\langle M^{v,l}\rangle_t=|\pi_tl|_H^2$, $0\le t\le 1$.

Note that w itself is an $\mathcal F_\cdot$-abstract Wiener process.
Any $\mathcal G_\cdot$-abstract Wiener process v generates its associated zero
mean Gaussian random field
$\{\delta_vl:={}_\Omega\langle v,l\rangle_{\Omega^*},\ l\in\Omega^*\}$ with
covariance structure

$E\,\delta_vl_1\,\delta_vl_2=\tfrac14\big(E\,{}_\Omega\langle v,l_1+l_2\rangle_{\Omega^*}^2-E\,{}_\Omega\langle v,l_1-l_2\rangle_{\Omega^*}^2\big)=\tfrac14\big(|l_1+l_2|_H^2-|l_1-l_2|_H^2\big)=(l_1,l_2)_H$

which can thus be extended by density to an H-indexed zero mean
isonormal Gaussian field $\{\delta_vh,\ h\in H\}$.

As defined in (i), (ii) and (iii) below, the integrator of Ito's integral
will be a $\mathcal G_\cdot$-abstract Wiener process v, and in (iv) its
"semimartingale" extension. The integrands, now to be defined, will be
$(\mathcal G_\cdot,\pi_\cdot)$-adapted H-valued random variables (eventually
all of them).

(i) For h simple as in (2.3), that is, $h=\sum_{k=0}^{n-1}a_kh_k$ with
$0=t_0<\cdots<t_n=1$, $a_k\in L^2(\Theta,\mathcal G_{t_k},P)$ and
$h_k\in(\pi_{t_{k+1}}-\pi_{t_k})(H)$, define
$\delta_vh=\sum_{k=0}^{n-1}a_k\,\delta_vh_k$. For any such h

(3.6) $E\,\delta_vh=0,\qquad E(\delta_vh)^2=E|h|_H^2.$

(ii) By (3.6) $\delta_v$ can be isometrically extended to the closure in
$L^2(P;H)$ of the simple random variables, which turns out to be the set of
$(\mathcal G_\cdot,\pi_\cdot)$-adapted elements of $L^2(P;H)$. This extension
satisfies (3.6) as well.

(iii) For any $(\mathcal G_\cdot,\pi_\cdot)$-adapted H-valued random variable
h, the sequence of $\mathcal G_\cdot$-stopping times
$\tau_n=\inf\{t\in[0,1]\ \text{s.t.}\ |\pi_th|_H>n\}$ ($\inf\emptyset=1$)
increases to 1 as $n\to\infty$, and
$\delta_vh:=\lim_{n\to\infty}\delta_v\pi_{\tau_n}h$ exists almost surely.

(iv) Whenever z = v + u, where u is an H-valued random variable, define
$\delta_zh:=\delta_vh+(u,h)_H$ for any H-valued random variable h which is
$(\mathcal G_\cdot,\pi_\cdot)$-adapted. This definition is independent of z's
representation.

In abstract Wiener space, (3.5) becomes, for any
$(\mathcal G_\cdot,\pi_\cdot)$-adapted h and with y = h + v,

(3.7) $\Lambda_h:=\exp\big(-\delta_vh-\tfrac12|h|_H^2\big)=\exp\big(-\delta_yh+\tfrac12|h|_H^2\big).$

Proposition 3.7 ([10]). Let v be a $\mathcal G_\cdot$-abstract Wiener
process, h a $(\mathcal G_\cdot,\pi_\cdot)$-adapted H-valued random variable,
and y = h + v. If $E\Lambda_h=1$ then y is a $\mathcal G_\cdot$-abstract
Wiener process on $(\Theta,\mathcal F,\Lambda_h\cdot P)$. In particular

(3.8) $E\Lambda_h\varphi(y)=E\varphi(v)\qquad\forall\varphi\in C_b(\Omega).$

Moreover, y's and v's image measures $\mu_y$ and $\mu_v$ are mutually
absolutely continuous, and

(3.9) $\dfrac{d\mu_y}{d\mu_v}=\lambda_h^{-1},\ \mu_y\text{-a.s.},\quad\text{where } E(\Lambda_h\mid\mathcal F_1^y)=\lambda_h(y).$

Proof. The Girsanov statement (3.8) is a straightforward generalization
of the classical Girsanov theorem (Proposition 3.5), a proof of which
can be found in [10], Theorem 2.6.3. From (3.8) it follows for all
$\varphi\in C_b(\Omega)$ that
$\int_\Omega\varphi(\omega)\,\mu_v(d\omega)=E\Lambda_h\varphi(y)=E\lambda_h(y)\varphi(y)=\int_\Omega\lambda_h(\omega)\varphi(\omega)\,\mu_y(d\omega)$,
and thus $\mu_v\ll\mu_y$ with $\frac{d\mu_v}{d\mu_y}=\lambda_h$, $\mu_y$-a.s.

Moreover, since $\Lambda_h$ is strictly positive P-a.s., so is $\lambda_h$,
$\mu_y$-a.s., and thus $\mu_v$-a.s. as well. This means that
$\mu_y\sim\mu_v$ and $\frac{d\mu_y}{d\mu_v}=\lambda_h^{-1}$, $\mu_y$-a.s. □

Although the assumption $E\Lambda_h=1$ in Proposition 3.7 holds under weaker
Novikov-type requirements, the following stronger sufficient condition will 
suit our needs. 



Lemma 3.8. Given a $\mathcal G_\cdot$-abstract Wiener process v, if
$h\in L^\infty(P;H)$ is $(\mathcal G_\cdot,\pi_\cdot)$-adapted, then
$\{\Lambda_{\pi_th}, 0\le t\le 1\}$ is a $\mathcal G_\cdot$-martingale. In
particular $E\Lambda_h=1$.

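Proof. Assume first that $h=\sum_{k=1}^na_kh_k$ is simple, and note that
$|a_k|\le M$ a.s. for some $M<\infty$ and k = 1, ..., n, and that
$E\Lambda_{\pi_th}\le E\,e^{M\sum_{k=1}^n|\delta_vh_k|}<\infty$, for any
$0\le t\le 1$ since $\delta_vh_1,\dots,\delta_vh_n$ are Gaussian and
independent. To show that
$E(\Lambda_{\pi_th}\mid\mathcal G_s)=\Lambda_{\pi_sh}$ for all s < t we may
assume without loss of generality that $s=t_{m-1}$ and $t=t_m$ for some m.
In this case
$\Lambda_{\pi_th}=\exp\big(-\sum_{k=1}^m(a_k\,\delta_vh_k+\tfrac12a_k^2|h_k|_H^2)\big)$
and

$E(\Lambda_{\pi_th}\mid\mathcal G_{t_{m-1}})=\Lambda_{\pi_sh}\,E\big(e^{-a_m\delta_vh_m-\frac12a_m^2|h_m|_H^2}\mid\mathcal G_{t_{m-1}}\big)=\Lambda_{\pi_sh}$

since $a_m$ is $\mathcal G_{t_{m-1}}$-measurable and
$\delta_vh_m\sim N(0,|h_m|_H^2)$ is independent of $\mathcal G_{t_{m-1}}$.

If h is $(\mathcal G_\cdot,\pi_\cdot)$-adapted and $|h|_H\le M<\infty$ a.s.,
let $h_n$ be a sequence of simple adapted H-valued random variables such
that $h_n\to h$ in $L^2(\Theta,\mathcal F,P;H)$ as $n\to\infty$. Then
$E(\Lambda_{\pi_th_n}\mid\mathcal G_s)=\Lambda_{\pi_sh_n}$ for any
$n\in\mathbb N$ and s < t. Clearly $\Lambda_{\pi_rh_n}\to\Lambda_{\pi_rh}$ in
probability as $n\to\infty$, for r = s and r = t. Since

$E\Lambda_{\pi_th_n}^2=E\,\Lambda_{2\pi_th_n}\,e^{|\pi_th_n|_H^2}\le e^{M^2}E\Lambda_{2\pi_th_n}=e^{M^2}$

the conditional expectation converges as well and thus
$E(\Lambda_{\pi_th}\mid\mathcal G_s)=\Lambda_{\pi_sh}$. □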

Proposition 3.9. If, in Proposition 3.7, $\mathcal G_\cdot$ can be taken to
be $\mathcal F_\cdot^y$ (i.e. v is an $\mathcal F_\cdot^y$-abstract Wiener
process), then $\mu_y$ and $\mu_v$ are mutually absolutely continuous, with

(3.10) $\dfrac{d\mu_y}{d\mu_v}(y)=\Lambda_h^{-1}\ \big(=e^{\delta_vh+\frac12|h|_H^2}=e^{\delta_yh-\frac12|h|_H^2}\big)$ P-a.s.

[In this case $\Lambda_h$ is $\mathcal F_1^y$-measurable. The point here is
that $\frac{d\mu_y}{d\mu_v}(y)=\Lambda_h^{-1}$, as in (3.9), without requiring
$E\Lambda_h=1$ a priori.] The following proof is essentially taken from [10]
Theorem 2.4.2 (where y is referred to as an indirect shift of v) and
adapted here to the abstract Wiener space setup.

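Proof. Define $\tau_n=\inf\{t\in[0,1]\ \text{s.t.}\ |\pi_th|_H>n\}$
($\inf\emptyset=1$) and let $y_n=h_n+v$ with $h_n=\pi_{\tau_n}h$. Since
$|h_n|_H\le n$ a.s., Lemma 3.8 guarantees that $E\Lambda_{h_n}=1$ so that it
follows from Proposition 3.7 that $\mu_{y_n}\sim\mu_v$ and
$\frac{d\mu_{y_n}}{d\mu_v}(y_n)=\Lambda_{h_n}^{-1}$ a.s., since
$\Lambda_{h_n}$ itself is $\mathcal F_1^{y_n}$-measurable, and also
$\frac{d\mu_v}{d\mu_{y_n}}(y_n)=\Lambda_{h_n}$. Thus, for any
$\varphi\in C_b(\Omega)$,

$\displaystyle\int_\Omega\varphi\,d\mu_v=E\Lambda_{h_n}\varphi(y_n)\longrightarrow E\Lambda_h\varphi(y)\qquad(n\to\infty)$

since $\Lambda_{h_n}\to\Lambda_h$ a.s., and thus by Scheffé's lemma
$\Lambda_{h_n}\,dP\to\Lambda_h\,dP$ in total variation. This means that
$\mu_v\ll\mu_y$ and $\frac{d\mu_v}{d\mu_y}(y)=\Lambda_h$, and since
$\Lambda_h>0$ a.s., the reverse is true as well, namely $\mu_y\ll\mu_v$ and
$\frac{d\mu_y}{d\mu_v}(y)=\Lambda_h^{-1}$. □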

The assumption in Proposition 3.9 that v is an $\mathcal F_\cdot^y$-abstract
Wiener process suggests that y = h + v should be interpreted as a
nonanticipative feedback model, and can thus be expected to hold in the
case (2.6):

Proposition 3.10. Assume the setup M1-M5 in Section 2. Then for $\mu_x$
almost every $x\in X$, w in (2.6) is an $\mathcal F_\cdot^{y_x}$-abstract
Wiener process, $\mu_{y_x}\ll\mu_w$ and

(3.11) $\dfrac{d\mu_{y_x}}{d\mu_w}(y_x)=\exp\big(\delta_wu_x+\tfrac12|u_x|_H^2\big),$ P-a.s.

For the model (2.5),

(3.12) $\dfrac{d\mu_{y|x}}{d\mu_w}(y)=\exp\big(\delta_wu+\tfrac12|u|_H^2\big),$ P-a.s.

and in particular

(3.13) $E\log\dfrac{d\mu_{y|x}}{d\mu_w}(y)=\tfrac12E|u|_H^2.$

Proof. Recall that w is an $\mathcal F_\cdot$-abstract Wiener process (cf.
Definition 3.6). On the other hand, from (2.6) and bearing in mind that the
mapping $U(x,\cdot)$ is nonanticipative, we conclude that
$\mathcal F_t^w\subset\mathcal F_t^{y_x}\subset\mathcal F_t$ for all
$0\le t\le 1$, so that
$M_t^{w,l}={}_\Omega\langle w,\pi_tl\rangle_{\Omega^*}$ is not only an
$\mathcal F_\cdot$-martingale for each $l\in\Omega^*$ but also an
$\mathcal F_\cdot^{y_x}$-martingale, and with the same quadratic variation.
In other words, w is an $\mathcal F_\cdot^{y_x}$-abstract Wiener process.

Thus Proposition 3.9 applies to $y_x=u_x+w$ with $h=u_x$ and v = w [$u_x$
is indeed $\mathcal F_\cdot^{y_x}$-adapted, again by (2.6) and U's
nonanticipativity], and (3.11) follows.

As for (3.12) we first claim that
$\frac{d\mu_{y|x}}{d\mu_w}(y)=\frac{d\mu_{y_x}}{d\mu_w}(y_x)\big|_{x=x}$.
Indeed, for any $\psi\in C_b(\Omega)$,

$E(\psi(y)\mid x)=E(\psi(u_x+w)\mid x)=E\psi(u_x+w)\big|_{x=x}=E\psi(y_x)\big|_{x=x}$

(where the independence of x and w was used in the second and last
equalities), from which it follows that $\mu_{y|x}\ll\mu_w$, $\mu_x$-a.s.,
and thus $\mu_y\ll\mu_w$, and moreover
$\frac{d\mu_{y|x}}{d\mu_w}(\omega)=\frac{d\mu_{y_x}}{d\mu_w}(\omega)\big|_{x=x}$.
By virtue of the absolute continuity itself,

$\dfrac{d\mu_{y|x}}{d\mu_w}(y)=\dfrac{d\mu_{y_x}}{d\mu_w}(y)\Big|_{x=x}=\dfrac{d\mu_{y_x}}{d\mu_w}(y_x)\Big|_{x=x}$

as claimed. Combining this with (3.11), and recalling that
$u=u_x\big|_{x=x}$, we obtain

$\dfrac{d\mu_{y|x}}{d\mu_w}(y)=\exp\Big((\delta_wu_x)\big|_{x=x}+\tfrac12|u|_H^2\Big),$ P-a.s.

Note, from the definition of Ito's integral $\delta_w$ and the independence
of w and x, that $(\delta_wu_x)\big|_{x=x}=\delta_wu$. Thus
$E\log\frac{d\mu_{y|x}}{d\mu_w}(y)=E\,\delta_wu+\tfrac12E|u|_H^2=\tfrac12E|u|_H^2$
by (3.6), since $|u|_H$ was assumed to have finite second moment. □
Having found an expression for (2.10)'s first term based on the
representation (2.6), the starting point for the second term is necessarily
(2.5). However, in order to be able to apply Proposition 3.9 in this case
(w is no longer an $\mathcal F_\cdot^y$-abstract Wiener process) it is
necessary to replace (2.5) by y's equivalent innovation representation.

Lemma 3.11. $n:=y-\hat u_\cdot^y=(u-\hat u_\cdot^y)+w$ is an
$\mathcal F_\cdot^y$-abstract Wiener process.

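Proof. Let $l\in\Omega^*$. We need to show that
$M_t^{n,l}:={}_\Omega\langle n,\pi_tl\rangle_{\Omega^*}$ is an
$\mathcal F_\cdot^y$-martingale with quadratic variation $|\pi_tl|_H^2$.
Indeed,

$E(M_t^{n,l}-M_s^{n,l}\mid\mathcal F_s^y)$
$\quad=E\big({}_\Omega\langle n,(\pi_t-\pi_s)l\rangle_{\Omega^*}\mid\mathcal F_s^y\big)$
$\quad=E\big({}_\Omega\langle u-\hat u_\cdot^y,(\pi_t-\pi_s)l\rangle_{\Omega^*}\mid\mathcal F_s^y\big)+E\big(E({}_\Omega\langle w,(\pi_t-\pi_s)l\rangle_{\Omega^*}\mid\mathcal F_s)\mid\mathcal F_s^y\big).$

We shall show that both terms above equal zero, assuming without loss of
generality that s and t are dyadic. The second term is indeed zero since w
is an $\mathcal F_\cdot$-abstract Wiener process. For the first term, denote
by $\mathcal A^n_{\pi_\cdot,\mathcal F_\cdot^y}$ the $\sigma$-algebra generated
by the H-valued random variables of the form (2.4) on the partition
$\mathcal P_n=\{k/2^n,\ k=0,\dots,2^n\}$ of [0, 1], and

$\hat u^{y,n}=E(u\mid\mathcal A^n_{\pi_\cdot,\mathcal F_\cdot^y})=\displaystyle\sum_{k=0}^{2^n-1}E\big((\pi_{(k+1)/2^n}-\pi_{k/2^n})u\mid\mathcal F^y_{k/2^n}\big).$

It follows from Lemma 2.2 that $\hat u^{y,n}\to\hat u_\cdot^y$ in
$L^2(P;H)$, so that it suffices to show that
$E\big({}_\Omega\langle u-\hat u^{y,n},(\pi_t-\pi_s)l\rangle_{\Omega^*}\mid\mathcal F_s^y\big)=0$
for every n large enough. Denote $u_k=(\pi_{(k+1)/2^n}-\pi_{k/2^n})u$ and, by
the dyadic assumption, $s=k_0/2^n$ and $t=k_1/2^n$. Then

$E\big({}_\Omega\langle u-\hat u^{y,n},(\pi_t-\pi_s)l\rangle_{\Omega^*}\mid\mathcal F_s^y\big)$
$\quad=\displaystyle\sum_{k=k_0}^{k_1-1}E\big({}_\Omega\langle u_k-E(u_k\mid\mathcal F^y_{k/2^n}),l\rangle_{\Omega^*}\mid\mathcal F^y_{k_0/2^n}\big)$
$\quad=\displaystyle\sum_{k=k_0}^{k_1-1}E\Big(E\big({}_\Omega\langle u_k-E(u_k\mid\mathcal F^y_{k/2^n}),l\rangle_{\Omega^*}\mid\mathcal F^y_{k/2^n}\big)\mid\mathcal F^y_{k_0/2^n}\Big)=0.$

As for the quadratic variation, note that
$M_t^{n,l}=(u-\hat u_\cdot^y,\pi_tl)_H+M_t^{w,l}$. By Lemma 2.1 the first term
is almost surely continuous, has bounded variation and thus zero quadratic
variation, so that
$\langle M^{n,l}\rangle_t=\langle M^{w,l}\rangle_t=|\pi_tl|_H^2$. □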
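Corollary 3.12.

(3.14) $E\log\dfrac{d\mu_y}{d\mu_w}(y)=\tfrac12E|\hat u_\cdot^y|_H^2.$

Proof. We may apply Proposition 3.9 to $y=\hat u_\cdot^y+n$ to conclude that

(3.15) $\dfrac{d\mu_y}{d\mu_w}(y)=\dfrac{d\mu_y}{d\mu_n}(y)=\exp\big(\delta_n\hat u_\cdot^y+\tfrac12|\hat u_\cdot^y|_H^2\big).$

(Indeed, $\hat u_\cdot^y$ is clearly $\mathcal F_\cdot^y$-adapted and n is an
$\mathcal F_\cdot^y$-abstract Wiener process by Lemma 3.11.) Since
$E|\hat u_\cdot^y|_H^2\le E|u|_H^2<\infty$, and thus
$E\,\delta_n\hat u_\cdot^y=0$, (3.15) implies (3.14). □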

3.2. Proof of Theorem 3.1. All that remains to prove (3.1) [and thus
(3.2) as well] is to insert (3.13) and (3.14) in (2.10) and thus obtain
$I(x,y)=\tfrac12E|u|_H^2-\tfrac12E|\hat u_\cdot^y|_H^2=\tfrac12E|u-\hat u_\cdot^y|_H^2$.

4. Gaussian signals. Consider the particular case of (2.5)

(4.1) $y=\sqrt{\gamma}\,x+w,$

where x is assumed to be a zero mean Gaussian H-valued random variable
with correlation bilinear form $r(h,k)=E\,(x,h)_H(x,k)_H$, $h,k\in H$, and
associated correlation operator R on H characterized by
$(Rh,k)_H=r(h,k)$ for all $h,k\in H$. The positive constant $\gamma$ is
commonly called the signal to noise ratio.

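It is well known that R is nonnegative and of trace class. Its spectrum
thus consists of a nonincreasing summable sequence
$\{\lambda_i\}_{i=1}^\infty$ of nonnegative eigenvalues with an associated
family $\{\varphi_i\}_{i=1}^\infty$ of orthonormal eigenvectors and

$R=\displaystyle\sum_{i=1}^\infty\lambda_i\,\varphi_i\otimes\varphi_i,$ that is, $r(h,k)=\displaystyle\sum_{i=1}^\infty\lambda_i(\varphi_i,h)_H(\varphi_i,k)_H\quad\forall h,k\in H,$

which leads immediately to the representation

(4.2) $x=\displaystyle\sum_{i=1}^\infty\sqrt{\lambda_i}\,\xi_i\,\varphi_i$ in $L^2(P;H)$

where $\{\xi_i=\lambda_i^{-1/2}(x,\varphi_i)_H\}_{i=1}^\infty$ is an i.i.d.
N(0, 1) sequence.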

Theorem 4.1. The least causal mean square error of $x\sim N(0,R)$ with
y as in (4.1) is given by

(4.3) $\varepsilon^2(\gamma)=E|x-\hat x_\cdot^y|_H^2=\gamma^{-1}\displaystyle\sum_{i=1}^\infty\log(1+\lambda_i\gamma)=\gamma^{-1}\log\det(I+\gamma R).$

If x is only assumed to possess a covariance R (but not necessarily to be
Gaussian), then the right-hand side of (4.3) yields the least linear causal
mean square error.

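Proof. Expanding y and w in the vectors $\{\varphi_i\}$, one obtains
$\eta_i=\sqrt{\gamma\lambda_i}\,\xi_i+\omega_i$ where
$\omega_i=(w,\varphi_i)_H$ and $\eta_i=(y,\varphi_i)_H$, and the pairs
$(\xi_i,\eta_i)$ are independent for different i. From the orthogonality one
concludes that

(4.4) $\tilde\varepsilon^2(\gamma)=\displaystyle\sum_{i=1}^\infty\lambda_i\,E\big(\xi_i-E(\xi_i\mid\eta_i)\big)^2=\sum_{i=1}^\infty\frac{\lambda_i}{1+\gamma\lambda_i}$

(where the last equality is a standard one-dimensional calculation).
Applying (3.4) we obtain (4.3) as claimed:

$\varepsilon^2(\gamma)=\dfrac1\gamma\displaystyle\sum_{i=1}^\infty\int_0^\gamma\frac{\lambda_i}{1+\lambda_i\gamma_0}\,d\gamma_0=\frac1\gamma\sum_{i=1}^\infty\log(1+\lambda_i\gamma).$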
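The step from the noncausal to the causal error can be checked numerically. The following sketch, offered only as an illustration, averages the noncausal Gaussian error over [0, gamma] with a composite Simpson rule and compares the result with the logarithmic closed form. The spectrum is hypothetical and truncated to four eigenvalues, and the function names are not taken from the paper.

```python
import math

def noncausal_error(gamma, eigenvalues):
    """Sum of lambda_i / (1 + gamma*lambda_i): the noncausal Gaussian error."""
    return sum(lam / (1.0 + gamma * lam) for lam in eigenvalues)

def causal_error_closed_form(gamma, eigenvalues):
    """(1/gamma) * sum of log(1 + gamma*lambda_i): the causal Gaussian error."""
    return sum(math.log1p(gamma * lam) for lam in eigenvalues) / gamma

def causal_error_by_averaging(gamma, eigenvalues, intervals=2000):
    """Average of the noncausal error over [0, gamma], by composite Simpson rule."""
    width = gamma / intervals
    total = noncausal_error(0.0, eigenvalues) + noncausal_error(gamma, eigenvalues)
    for k in range(1, intervals):
        total += (4.0 if k % 2 else 2.0) * noncausal_error(k * width, eigenvalues)
    return total * width / 3.0 / gamma

eigenvalues = [2.0, 1.0, 0.5, 0.25]            # hypothetical truncated spectrum of R
gamma = 3.0
averaged = causal_error_by_averaging(gamma, eigenvalues)
closed = causal_error_closed_form(gamma, eigenvalues)
print(averaged, closed)
```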
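Note that the formulae (4.3) and (4.4) yield the asymptotic expansions in
powers of $\gamma$ in terms of the coefficients
$s_k=\sum_i\lambda_i^k$ ($s_k^{1/k}$ are known as R's Schatten norms):

$\varepsilon^2(\gamma)\sim\displaystyle\sum_{k=0}^\infty(-1)^k\frac{s_{k+1}}{k+1}\gamma^k$ and $\tilde\varepsilon^2(\gamma)\sim\displaystyle\sum_{k=0}^\infty(-1)^ks_{k+1}\gamma^k.$

It is of course not surprising that
$\varepsilon^2(\gamma)\sim\tilde\varepsilon^2(\gamma)\sim s_1=E|x|_H^2$ as
$\gamma\to0$.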
A more interesting consequence of these expansions is 



Corollary 4.2.

$\displaystyle\lim_{\gamma\to0}\frac{E|x|_H^2-\tilde\varepsilon^2(\gamma)}{E|x|_H^2-\varepsilon^2(\gamma)}=2.$



In other words, the noncausal error increases to its limit in small signal 
to noise ratio twice as fast as the causal error, regardless of the correlation 
operator. This is not necessarily true if x is not assumed to be Gaussian. 
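The small signal to noise behaviour can be explored numerically for a Gaussian signal. The sketch below is an illustration under assumptions: the spectrum is hypothetical and finite, and the power series in gamma are truncated after `terms` terms, which is meaningful only while gamma times the largest eigenvalue stays below 1. It compares the truncated series with the closed forms for both errors and then evaluates the ratio of the two error gaps for decreasing gamma.

```python
import math

def power_sum(eigenvalues, k):
    """s_k = sum_i lambda_i^k."""
    return sum(lam ** k for lam in eigenvalues)

def causal_error(gamma, eigenvalues):
    return sum(math.log1p(gamma * lam) for lam in eigenvalues) / gamma

def noncausal_error(gamma, eigenvalues):
    return sum(lam / (1.0 + gamma * lam) for lam in eigenvalues)

def causal_series(gamma, eigenvalues, terms):
    """Truncated expansion with coefficients (-1)^k * s_{k+1} / (k+1)."""
    return sum((-1) ** k * power_sum(eigenvalues, k + 1) / (k + 1) * gamma ** k
               for k in range(terms))

def noncausal_series(gamma, eigenvalues, terms):
    """Truncated expansion with coefficients (-1)^k * s_{k+1}."""
    return sum((-1) ** k * power_sum(eigenvalues, k + 1) * gamma ** k
               for k in range(terms))

def error_gap_ratio(gamma, eigenvalues):
    """(E|x|^2 - noncausal error) / (E|x|^2 - causal error), with E|x|^2 = s_1."""
    total_power = sum(eigenvalues)
    return ((total_power - noncausal_error(gamma, eigenvalues))
            / (total_power - causal_error(gamma, eigenvalues)))

eigenvalues = [1.0, 0.5, 0.25]                 # hypothetical spectrum
causal_gap = abs(causal_series(0.1, eigenvalues, 12) - causal_error(0.1, eigenvalues))
noncausal_gap = abs(noncausal_series(0.1, eigenvalues, 12)
                    - noncausal_error(0.1, eigenvalues))
ratios = [error_gap_ratio(g, eigenvalues) for g in (1e-1, 1e-2, 1e-3)]
print(causal_gap, noncausal_gap, ratios)
```

The printed ratios approach 2 as gamma decreases, in line with the corollary.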

The last application of Theorem 4.1 concerns the mean square causal 
estimation error of a stationary multivariate Gaussian process $\{x_t, t\in\mathbb R\}$ in
additive white noise. The so called Yovits-Jackson formula for this quantity 
has been obtained in the scalar case under various assumptions and by 
different analytic methods, as explained in the Introduction. Here it follows 
in full generality as a straightforward consequence of Theorem 4.1. 

Proposition 4.3. Let $\{x_t, t\in\mathbb R\}$ be a stationary zero mean
n-dimensional Gaussian process with continuous correlation function
$R(t):=E\,x_0x_t^T\in L^1(\mathbb R;\mathbb R^{n\times n})$ and spectral
density $S(\xi)$, and let $y_t=\sqrt{\gamma}\int_0^tx_s\,ds+w_t$,
$t\in\mathbb R$, where $\{w_t, t\in\mathbb R\}$ is a two-sided standard
n-dimensional Brownian motion and $\gamma>0$. Furthermore, denote by
$\mathcal Y_a^b$ the sigma algebra generated by
$\{y_t-y_s,\ a\le s\le t\le b\}$, for any $-\infty\le a<b\le\infty$. Then,
for any fixed time $\theta$,

(4.5) $E\,|x_\theta-E(x_\theta\mid\mathcal Y_{-\infty}^\theta)|^2=\dfrac1{2\pi\gamma}\displaystyle\int_{-\infty}^\infty\log\det(I+\gamma S(\xi))\,d\xi.$

Proof. On each finite time interval [0, T], this case can be modeled
by the classical Wiener space $\Omega=C_0([0,T];\mathbb R^n)$ with
$w=w_\cdot$, $x=\int_0^\cdot x_t\,dt$ and $y=y_\cdot=\sqrt{\gamma}\,x+w$. Let
$R_T$ be the Toeplitz integral operator with kernel R(t - s) and spectrum
$\{\lambda_i^{(T)}\}$ and $I_T$ the identity operator, on
$L^2([0,T];\mathbb R^n)$. By Theorem 4.1, and in view of the stationarity,

(4.6) $\dfrac1T\displaystyle\int_0^TE\,|x_\theta-E(x_\theta\mid\mathcal Y_{\theta-t}^\theta)|^2\,dt=\frac1T\int_0^TE\,|x_t-E(x_t\mid\mathcal Y_0^t)|^2\,dt=\frac1\gamma\Big(\frac1T\log\det(I_T+\gamma R_T)\Big).$

The integrand in the left-hand side converges, as $t\to\infty$, to the
left-hand side of (4.5) by standard martingale theory, and thus so does the
integral average itself. The convergence of the right-hand side is a
consequence of a matrix-valued version of the Kac-Murdock-Szegő theorem on
$R_T$'s asymptotic eigenvalue distribution (see [4], Section 4.4 or [2],
page 139). Specifically, [6] Theorem 3.2, states (formula (3.2) in [6]
contains a typographical error; the integrand there should be
$\mathrm{tr}\,\Phi(K(t))$, as is evident throughout the subsequent proof)
that as $T\to\infty$ the term in parenthesis converges to
$\frac1{2\pi}\int_{-\infty}^\infty\log\det(I+\gamma S(\xi))\,d\xi$ which
concludes the proof. (The cited theorem was applied to the function
$\Phi(z)=\log(1+\gamma z)$, $z\in[0,E|x_0|^2]$, which is allowed in view of
Remark 3.2 in [6].) □
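For a scalar example, the spectral formula can be compared with the steady-state Kalman-Bucy error. Everything specific in the sketch below is an assumption made for illustration and is not taken from the paper: the signal is a stationary Ornstein-Uhlenbeck process with correlation sigma2 * exp(-alpha |t|), the spectral density follows the convention S(xi) = integral of R(t) exp(-i xi t) dt, giving S(xi) = 2 alpha sigma2 / (alpha^2 + xi^2), and the reference value is the positive root of the stationary Riccati equation gamma P^2 + 2 alpha P - 2 alpha sigma2 = 0 from classical filtering theory.

```python
import math

def yovits_jackson_scalar(gamma, alpha, sigma2, nodes=4000):
    """(1/(2*pi*gamma)) * integral over the real line of log(1 + gamma*S(xi)),
    for the assumed density S(xi) = 2*alpha*sigma2 / (alpha^2 + xi^2).
    The substitution xi = alpha*tan(theta) maps the line to (-pi/2, pi/2);
    the midpoint rule never evaluates the endpoints."""
    ratio = 2.0 * gamma * sigma2 / alpha       # gamma*S(xi) = ratio * cos(theta)^2
    width = math.pi / nodes
    total = 0.0
    for k in range(nodes):
        theta = -0.5 * math.pi + (k + 0.5) * width
        c2 = math.cos(theta) ** 2
        total += alpha * math.log1p(ratio * c2) / c2   # integrand times d(xi)/d(theta)
    return total * width / (2.0 * math.pi * gamma)

def riccati_steady_state(gamma, alpha, sigma2):
    """Positive root of gamma*P^2 + 2*alpha*P - 2*alpha*sigma2 = 0."""
    return (math.sqrt(alpha * alpha + 2.0 * alpha * sigma2 * gamma) - alpha) / gamma

gamma, alpha, sigma2 = 2.0, 0.8, 1.5           # hypothetical parameter values
from_formula = yovits_jackson_scalar(gamma, alpha, sigma2)
from_riccati = riccati_steady_state(gamma, alpha, sigma2)
print(from_formula, from_riccati)
```

Agreement of the two numbers is a consistency check for this particular model and convention only; with a different normalization of the spectral density the prefactor would change accordingly.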